Add DSv4 FP8 H200 SGLang MTP benchmark#1265
Conversation
Mirror of dsv4-fp8-h200-sglang plus EAGLE speculative decoding flags (--speculative-algorithm EAGLE, --speculative-num-steps 3, --speculative-eagle-topk 1, --speculative-num-draft-tokens 4). The (3,1,4) chain matches the dsv4-fp4-b300-sglang-mtp TP-only path. Same image, runner pool (h200-dgxc), and search space as the non-MTP entry. The launcher resolves the new spec-decoding: mtp matrix entries to benchmarks/single_node/dsv4_fp8_h200_sglang_mtp.sh via the framework-tagged + _mtp suffix lookup that landed with #1264. run_benchmark_serving uses --dsv4 (DSv4-Pro chat framing) per the AGENTS.md rule that all MTP scripts must benchmark against chat-formatted inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| - "EAGLE speculative decoding chain: --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4" | ||
| - "run_benchmark_serving uses --dsv4 (chat-formatted prompts) per the AGENTS.md MTP rule, since EAGLE acceptance regresses on raw random tokens" | ||
| - "Search space mirrors the non-MTP H200 SGLang entry: TP=8 EP=1, conc 1 and 4-64 for both 1k1k and 8k1k, with spec-decoding: mtp" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1265 |
There was a problem hiding this comment.
🟡 The new dsv4-fp8-h200-sglang-mtp entry (perf-changelog.yaml:2124) has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — the "XXX" placeholder was never replaced with this PR's real number (#1265). Despite the PR description claiming the link was backfilled, the committed file still has the placeholder; please update it to /pull/1265 to match the convention used by every other entry in the file.
Extended reasoning...
What the bug is. The new entry added by this PR for the dsv4-fp8-h200-sglang-mtp config in perf-changelog.yaml ends with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. "XXX" is clearly a templated placeholder — every one of the ~150 other entries in this same file uses a concrete PR number, and the PR's own description even claims "PR-link backfilled to #1265". The backfill never happened.\n\nHow it manifests. Anything that consumes perf-changelog.yaml and follows pr-link will hit https://github.com/SemiAnalysisAI/InferenceX/pull/XXX, which is not a valid PR. GitHub renders this as a 404. Any internal changelog tooling, dashboard, or script that crawls these links to surface release notes will silently produce a broken hyperlink for this one entry.\n\nStep-by-step proof. (1) The PR description states "perf-changelog.yaml updated; PR-link backfilled to #1265." (2) The pre-loaded modified-files content for perf-changelog.yaml literally ends with the line pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. (3) Independently confirmed by running git show HEAD:perf-changelog.yaml | tail -1 against commit 2f28e59 — it returns pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX. (4) The PR's own number is #1265 (per the metadata at the top of the timeline), and the immediately-prior entry in the same file correctly uses /pull/1264. The intended value is unambiguously 1265.\n\nAddressing the refutation. A verifier objected that get_pr_diff shows + pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1265 and concluded the merged result will be correct. That is contradicted by directly inspecting the committed tree: git show HEAD:perf-changelog.yaml on the merge candidate (2f28e59) shows /pull/XXX, not /pull/1265. Whatever the diff-fetching tool returned does not match what is actually on the branch — the on-disk file and the committed object both carry the placeholder. Since GitHub merges what's in the tree, not a synthesized diff, the placeholder is what will land on main if this PR is merged as-is.\n\nWhy existing review didn't catch it. It's a one-line change at the very tail of a 2000+ line YAML file, and the surrounding lines look intentional and well-formed. The PR description even asserts the backfill was done, which discourages a closer look. There's no schema check on pr-link values, so no CI signal.\n\nImpact and severity. No runtime impact — perf-changelog.yaml is documentation, not consumed by the benchmark pipeline. The blast radius is limited to whatever tooling renders this changelog. This is a trivial one-character fix (XXX → 1265), and easy to make before merging.\n\nHow to fix. Replace the last line of perf-changelog.yaml with:\n\nyaml\n pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1265\n
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25263257378 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25264520177 |
AGENTS.md requires new perf-changelog entries to be appended to the end of the file (oldest at top, newest at bottom). The original commit prepended the new entry above PR #95; move it after the current last entry (PR #1265) to satisfy the convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1267) * Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep) - New `kimik2.5-int4-b300-vllm` config with the corresponding `benchmarks/single_node/kimik2.5_int4_b300.sh` launch script (mirrors the existing INT4 B200 vLLM recipe; the upstream vLLM Kimi-K2.5 recipes page does not yet ship B300-specific tuning). - Image: `vllm/vllm-openai:v0.20.0-cu130` — the original draft (#1057, reverted in #1070, reopened as #1071) carried `v0.19.0` while we waited on a working release; 0.20.0 has now shipped. - Search-space per (ISL, OSL): the existing TP=8 sweep plus a new TP=4 / EP=1 entry covering the lower-TP / expert-parallel variant on the same B300 nodes. Supersedes #1071 — opening fresh from main since the merge base had drifted (b200 schema migrated from `seq-len-configs` to `scenarios.fixed-seq-len`) and the user preferred a clean reopen over a rebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf-changelog: move kimik2.5-int4-b300-vllm entry to bottom AGENTS.md requires new perf-changelog entries to be appended to the end of the file (oldest at top, newest at bottom). The original commit prepended the new entry above PR #95; move it after the current last entry (PR #1265) to satisfy the convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add DSv4 FP8 H200 SGLang MTP benchmark Mirror of dsv4-fp8-h200-sglang plus EAGLE speculative decoding flags (--speculative-algorithm EAGLE, --speculative-num-steps 3, --speculative-eagle-topk 1, --speculative-num-draft-tokens 4). The (3,1,4) chain matches the dsv4-fp4-b300-sglang-mtp TP-only path. Same image, runner pool (h200-dgxc), and search space as the non-MTP entry. The launcher resolves the new spec-decoding: mtp matrix entries to benchmarks/single_node/dsv4_fp8_h200_sglang_mtp.sh via the framework-tagged + _mtp suffix lookup that landed with SemiAnalysisAI#1264. run_benchmark_serving uses --dsv4 (DSv4-Pro chat framing) per the AGENTS.md rule that all MTP scripts must benchmark against chat-formatted inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf-changelog: fill in PR link for dsv4-fp8-h200-sglang-mtp Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…emiAnalysisAI#1267) * Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep) - New `kimik2.5-int4-b300-vllm` config with the corresponding `benchmarks/single_node/kimik2.5_int4_b300.sh` launch script (mirrors the existing INT4 B200 vLLM recipe; the upstream vLLM Kimi-K2.5 recipes page does not yet ship B300-specific tuning). - Image: `vllm/vllm-openai:v0.20.0-cu130` — the original draft (SemiAnalysisAI#1057, reverted in SemiAnalysisAI#1070, reopened as SemiAnalysisAI#1071) carried `v0.19.0` while we waited on a working release; 0.20.0 has now shipped. - Search-space per (ISL, OSL): the existing TP=8 sweep plus a new TP=4 / EP=1 entry covering the lower-TP / expert-parallel variant on the same B300 nodes. Supersedes SemiAnalysisAI#1071 — opening fresh from main since the merge base had drifted (b200 schema migrated from `seq-len-configs` to `scenarios.fixed-seq-len`) and the user preferred a clean reopen over a rebase. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf-changelog: move kimik2.5-int4-b300-vllm entry to bottom AGENTS.md requires new perf-changelog entries to be appended to the end of the file (oldest at top, newest at bottom). The original commit prepended the new entry above PR SemiAnalysisAI#95; move it after the current last entry (PR SemiAnalysisAI#1265) to satisfy the convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
dsv4-fp8-h200-sglang-mtp, the MTP variant ofdsv4-fp8-h200-sglang(sglang dsv4-pro hopper (rebased on main, with --disable-radix-cache) #1264).lmsysorg/sglang:deepseek-v4-hopper@sha256:7f19c6dc…), sameh200-dgxcrunner pool, same search space (TP=8 EP=1, conc 1 and 4-64 for 1k1k and 8k1k) — search-space entries gainspec-decoding: mtp.benchmarks/single_node/dsv4_fp8_h200_sglang_mtp.shmirrors the non-MTP one with the EAGLE speculative-decoding flags appended:--speculative-algorithm EAGLE--speculative-num-steps 3--speculative-eagle-topk 1--speculative-num-draft-tokens 4dsv4-fp4-b300-sglang-mtpTP-only path.*_mtp.shscripts,run_benchmark_servingis invoked with--dsv4so prompts are chat-formatted (the canonical DSv4-Pro tokenizer ships no jinja chat template, so plain--use-chat-templatewould crash;--dsv4routes throughencoding_dsv4.pyfrom [DSv4] add jinja chat template support #1153)._mtpsuffix logic that already landed inlaunch_h200-dgxc-slurm.shfrom sglang dsv4-pro hopper (rebased on main, with --disable-radix-cache) #1264.perf-changelog.yamlupdated; PR-link backfilled to Add DSv4 FP8 H200 SGLang MTP benchmark #1265.Test plan
dsv4-fp8-h200-sglang-mtprecipe lands onh200-dgxcand produces results🤖 Generated with Claude Code